Introduction

Video games are about learning to make good decisions. I have always enjoyed the freedom they offer through the choices available.

Mario Kart was one of the first games I played as a kid, back then on the SNES and now on the Switch. At the time the choice was simple: big and heavy Donkey Kong or the sneaky Toad. Now we get to choose the character but also the kart parts: kart frame, tires and glider (yes, Mario Kart is also about gliding now).

So, is there a best combination?

Libraries

This project requires the libraries listed below:

library(tidyverse)
library(readxl)
library(knitr)
library(kableExtra)
library(skimr)
library(GGally)
library(gtools) # Generate combination
library(patchwork)
library(magick)
library(png)
library(ggridges)
library(plotly)
library(ggimage)

Data

Source

After a quick Reddit search, I found a dataset (https://docs.google.com/spreadsheets/d/1g7A-38tn9UAIbB2B3sZI-MpILsS3ZS870UTVMRRxh4Q/edit#gid=0) based on in-game experiments and compiled by Luigi_Fan2 (https://twitter.com/Luigi_Fan2).

Its annotations show that many karts and parts are redundant: they share the same stats and differ only in appearance. Please refer to this table if you want equivalent stats with different skins.

Import Data

Instead of importing the data directly from Google Drive, I built an Excel file to ease the wrangling process. The modified Excel file is downloadable here. I organized the data on several sheets depending on their type (Character, Kart…).

library(readxl)

# Every sheet has one text column followed by 12 numeric columns
stat_file  <- "Mario Kart 8 Deluxe Stat.xlsx"
stat_types <- c("text", rep("numeric", 12))

Char  <- read_excel(stat_file, sheet = "Char",  col_names = TRUE, col_types = stat_types)
Kart  <- read_excel(stat_file, sheet = "Kart",  col_names = TRUE, col_types = stat_types)
Tire  <- read_excel(stat_file, sheet = "Tire",  col_names = TRUE, col_types = stat_types)
Glide <- read_excel(stat_file, sheet = "Glide", col_names = TRUE, col_types = stat_types)

A Type column is added to each dataset before joining all of them:

Char<-Char %>% add_column(Type="Char", .after='ID')
Kart <- Kart %>% add_column(Type="Kart", .after='ID')
Tire <- Tire %>% add_column(Type="Tire", .after='ID')
Glide <- Glide %>% add_column(Type="Glide", .after='ID')

data1 <-  full_join(Char, Kart) %>% full_join(Glide) %>% full_join(Tire) #join datatables
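Since the four tables are assumed to share exactly the same columns, joining on all of them amounts to stacking their rows. Here is a minimal sketch of that behaviour on two toy tables (the IDs and values are made up for illustration, they are not taken from the spreadsheet), compared with dplyr's bind_rows():

```r
library(dplyr)
library(tibble)

# Toy tables with made-up stats (NOT values from the spreadsheet)
chars <- tibble(ID = c("A", "B"), Type = "Char", Speed = c(3, 4))
karts <- tibble(ID = c("K1", "K2", "K3"), Type = "Kart", Speed = c(0.25, 0, -0.5))

# Joining on every shared column, which is what full_join() does when `by` is omitted.
# No row of `chars` equals a row of `karts`, so nothing is merged.
joined  <- full_join(chars, karts, by = c("ID", "Type", "Speed"))
stacked <- bind_rows(chars, karts)

nrow(joined)    # 2 characters + 3 karts
nrow(stacked)
```

Both objects hold the same rows in the same order, so bind_rows() could be an option if the intent is only to stack the tables.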

Variables

Each character and part has its own stats, which add up on the following gauges:

  • Speed: maximum speed
  • Acceleration: how quickly the kart reaches its maximum speed
  • Weight: a heavier kart can push lighter ones out of the way
  • Handling: general kart handling in turns and drifts
  • Traction: higher traction means the kart is less affected when it goes off track
  • M-Turbo: a higher M-Turbo gives a longer and faster boost, which also takes less time to charge

IMAGE STAT

Dimension

44 characters, 36 karts, 21 tires and 14 gliders.

IMAGE mariokart builder

Meet the data

Variable Summaries and Distributions

Let’s take a glimpse at our data. Since characters have base characteristics and parts are only modifiers (bonuses and penalties), two tables are generated:

data1 %>% filter(Type=="Char") %>% select_if(is.numeric) %>% skim_to_wide() %>% kable("html") %>% kable_styling(bootstrap_options = "striped")
type variable missing complete n mean sd p0 p25 median p75 p100 hist
numeric Accel 0 16 16 3.69 0.44 3 3.25 3.75 4 4.25 ▃▆▁▃▃▁▇▆
numeric Handling_Anti-G 0 16 16 3.8 0.73 2.5 3.25 3.75 4.31 5 ▃▂▃▇▂▃▃▃
numeric Handling_Gliding 0 16 16 3.8 0.73 2.5 3.25 3.75 4.31 5 ▃▂▃▇▂▃▃▃
numeric Handling_Land 0 16 16 3.8 0.73 2.5 3.25 3.75 4.31 5 ▃▂▃▇▂▃▃▃
numeric Handling_Water 0 16 16 3.3 0.73 2 2.75 3.25 3.81 4.5 ▃▂▃▇▂▃▃▃
numeric M-turbo 0 16 16 3.41 0.4 2.75 3.19 3.5 3.75 4 ▃▃▁▆▇▁▆▃
numeric Speed_Anti-G 0 16 16 3.22 0.84 2 2.5 3.25 3.81 4.5 ▇▅▂▇▅▂▂▇
numeric Speed_Gliding 0 16 16 3.97 0.84 2.75 3.25 4 4.56 5.25 ▇▅▂▇▅▂▂▇
numeric Speed_Land 0 16 16 3.47 0.84 2.25 2.75 3.5 4.06 4.75 ▇▅▂▇▅▂▂▇
numeric Speed_Water 0 16 16 3.72 0.84 2.5 3 3.75 4.31 5 ▇▅▂▇▅▂▂▇
numeric Traction 0 16 16 3.59 0.41 3 3.25 3.62 3.81 4.25 ▃▇▁▃▇▁▃▃
numeric Weight 0 16 16 3.19 0.85 2 2.5 3.12 3.81 4.5 ▇▅▅▅▅▂▂▇

Note that:

  • Handling and Speed differ depending on the type of environment: land, gliding, anti-gravity and water.
  • Values range from 2 to 5.25
  • None of the variables follows a normal distribution
  • No value is missing
data1 %>% filter(Type!="Char") %>% select_if(is.numeric) %>% skim_to_wide() %>% kable("html") %>% kable_styling(bootstrap_options = "striped")
type variable missing complete n mean sd p0 p25 median p75 p100 hist
numeric Accel 0 27 27 -0.15 0.49 -1 -0.5 0 0.25 0.75 ▃▅▇▅▇▇▅▂
numeric Handling_Anti-G 0 27 27 -0.065 0.31 -0.75 -0.25 0 0.12 0.5 ▁▂▁▆▇▁▅▂
numeric Handling_Gliding 0 27 27 -0.16 0.33 -0.75 -0.5 0 0 0.25 ▂▅▁▃▁▇▁▅
numeric Handling_Land 0 27 27 -0.065 0.34 -0.75 -0.25 0 0.25 0.5 ▂▂▁▅▇▁▅▂
numeric Handling_Water 0 27 27 0.065 0.38 -0.75 -0.12 0 0.25 0.75 ▁▃▃▇▁▇▃▂
numeric M-turbo 0 27 27 -0.056 0.47 -1 -0.25 0 0.25 0.75 ▁▃▂▇▇▇▃▂
numeric Speed_Anti-G 0 27 27 0.0093 0.31 -0.75 -0.12 0 0.25 0.5 ▁▂▁▃▇▁▅▂
numeric Speed_Gliding 0 27 27 -0.19 0.28 -0.75 -0.38 -0.25 0 0.25 ▂▃▁▆▁▇▁▂
numeric Speed_Land 0 27 27 -0.019 0.35 -0.75 -0.25 0 0.25 0.5 ▁▃▁▇▇▁▇▅
numeric Speed_Water 0 27 27 -0.12 0.29 -0.75 -0.25 -0.25 0 0.5 ▁▂▁▇▇▁▂▂
numeric Traction 0 27 27 -0.046 0.52 -1.25 -0.25 0 0.25 1 ▂▂▂▅▇▆▃▂
numeric Weight 0 27 27 -0.0093 0.34 -0.5 -0.25 0 0.25 0.5 ▆▆▁▇▁▇▁▅

Note that:

  • Kart parts are also affected by the environment.
  • Values range from -1.25 to +1
  • No value is missing

For now, mean speed and mean handling are used; we will go into the details later.

data <- data1 %>%
  mutate(Speed_mean = rowMeans(data1[grepl("Speed", names(data1))]),
         Handling_mean = rowMeans(data1[grepl("Handling", names(data1))]))

data %>% filter(Type!="Char")%>% select(contains("mean")) %>% skim_to_wide() %>% kable("html") %>% kable_styling(bootstrap_options = "striped")
type variable missing complete n mean sd p0 p25 median p75 p100 hist
numeric Handling_mean 0 27 27 -0.056 0.27 -0.62 -0.19 0 0.16 0.44 ▂▃▁▂▇▂▅▁
numeric Speed_mean 0 27 27 -0.079 0.15 -0.5 -0.16 -0.062 0 0.19 ▁▁▁▂▁▇▂▁

Variables Correlations

I generally like to check correlations between variables in a dataset. For this I use the GGally package.

It looks like most variables are highly correlated:

  • Acceleration, Traction, Handling and Mini-Turbo are positively correlated with each other
  • Speed and Weight are positively correlated with each other but negatively with the others
data %>% filter(Type=="Char") %>% select_if(is.numeric) %>% select(-(matches('Land|Gliding|Water|Anti'))) %>% ggpairs()
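To put numbers on what the pairs plot shows, a rank-based correlation matrix is one option, since the stats do not follow a normal distribution. Below is a minimal sketch with base R's cor() on a toy table (the five rows are made-up values for hypothetical characters, not the real stats):

```r
# Toy stats for five hypothetical characters (made-up values)
toy <- data.frame(
  Weight = c(2.0, 2.5, 3.0, 3.5, 4.5),
  Speed  = c(2.25, 2.75, 3.5, 4.0, 4.75),
  Accel  = c(4.25, 4.0, 3.75, 3.25, 3.0)
)

# Spearman correlation works on ranks, so it does not assume normality
rho <- cor(toy, method = "spearman")
round(rho, 2)
```

In this toy table Weight and Speed rise together while Accel falls, which mirrors the pattern described above.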

A nice way to summarize all these correlations and observe how the variables relate is a parallel coordinates plot (visually close to a Sankey chart). It is also available in the GGally package, lucky us!

data %>% 
  filter(Type=="Char") %>% 
  select_if(is.numeric) %>% 
  select(-(matches('Land|Gliding|Water|Anti'))) %>% 
  select(Weight, Speed_mean, `M-turbo`,Traction,Accel,Handling_mean)%>% #Correlated variables are put close to each others
  ggparcoord(scale='globalminmax', groupColumn = "Weight")+
  scale_color_viridis_c(direction = -1,begin = 0.3) +
  theme(panel.grid.minor.y = element_blank(), panel.grid.major.x=element_line(color="grey60", linetype = 3),axis.ticks = element_blank(), panel.background = element_blank(), axis.text.x = element_text(angle=15), axis.title = element_blank()) + scale_x_discrete(position="top") + labs(y="Value")

Questions and Hypothesis

Beyond gameplay preferences (light or heavy characters), is there a best combination?

The best combination is defined as one that is effective in any situation; its speed and acceleration should therefore be balanced.

The parameters that need to be optimized (i.e. higher values are the best) are :

Weight is set aside because, as we have seen, it is not a measure of performance but a matter of play style or preference.

My first hypothesis was that the best combination would come from :

Analysis

Combinations computing

At first, I tried to identify the best characters and kart parts separately, but I realised that computing all the combinations and then picking the best one would be the cleanest way to get an answer.

comb <- as_data_frame(permutations(n=length(data1$ID),r=4,v=data1$ID, repeats.allowed = F)) %>% 
  filter(V1 %in% Char$ID & V2 %in% Kart$ID & V3 %in% Tire$ID & V4 %in% Glide$ID) #Generates all the combinations

# The following lines extract the stats of each character or part into separate tables
names(comb)[1] <- "ID"
comb1 <- inner_join(comb, data, by="ID")

names(comb)[1] <- "V1"
names(comb)[2] <- "ID"
comb2 <- inner_join(comb, data, by="ID")

names(comb)[2] <- "V2"
names(comb)[3] <- "ID"
comb3 <- inner_join(comb, data, by="ID")

names(comb)[3] <- "V3"
names(comb)[4] <- "ID"
comb4 <- inner_join(comb, data, by="ID")

#Computation of combinations stats
data_combo <- comb1[,6:19]+comb2[,6:19]+comb3[,6:19]+comb4[,6:19]

data_combo <- cbind(comb[,1:4], data_combo)

names(data_combo)[1]<-"Char"
names(data_combo)[2]<-"Kart"
names(data_combo)[3]<-"Tire"
names(data_combo)[4]<-"Glidder"

data_combo$ID <- seq_len(nrow(data_combo))

# Mean speed and mean handling are recomputed for each combination
data_combo <- data_combo %>%
  mutate(Speed_mean = rowMeans(data_combo[grepl("Speed", names(data_combo))]),
         Handling_mean = rowMeans(data_combo[grepl("Handling", names(data_combo))]))
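permutations() first builds every ordered 4-tuple of all IDs and only then filters them down to valid (character, kart, tire, glider) rows. A lighter alternative could be to build the Cartesian product of the four ID vectors directly. The sketch below uses tidyr::crossing() on short hypothetical ID vectors standing in for Char$ID, Kart$ID, Tire$ID and Glide$ID:

```r
library(tidyr)

# Hypothetical ID vectors standing in for Char$ID, Kart$ID, Tire$ID and Glide$ID
char_ids  <- c("Mario", "Bowser")
kart_ids  <- c("Standard Kart", "Biddybuggy", "Pipe Frame")
tire_ids  <- c("Standard", "Roller")
glide_ids <- c("Super Glider", "Cloud Glider")

# One row per (character, kart, tire, glider) combination
combos <- crossing(Char = char_ids, Kart = kart_ids,
                   Tire = tire_ids, Glider = glide_ids)

nrow(combos)  # 2 * 3 * 2 * 2 = 24
```

The number of rows is simply the product of the four vector lengths, which gives a quick sanity check on the real data.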

More variables!

I decided to add a few helpful variables:

  • Weighted Speed and Handling, which give twice as much importance to Land and Anti-G (these kinds of tracks seem to be the most frequent; I could not find any actual data on that, but a factor of 2 seems reasonable)
  • As weight is not a measure of performance, weights are turned into categories
  • A total performance indicator is computed by summing:
    • Mean Speed
    • Acceleration
# Land and Anti-G Speed and Handling are given double weight in the mean, since most of a race takes place on these surfaces.
data_combo  <- data_combo %>%
  rowwise() %>%
  mutate(Speed_mean=weighted.mean(x=c(Speed_Land,`Speed_Anti-G`, Speed_Water, Speed_Gliding), w=c(2,2,1,1))) %>%
  rowwise() %>%
  mutate(Handling_mean=weighted.mean(x=c(Handling_Land,`Handling_Anti-G`, Handling_Water, Handling_Gliding), w=c(2,2,1,1)))


# Weight categories: cut() bins the summed weight into five right-closed intervals
data_combo <- data_combo %>% mutate(w_cat = cut(Weight,breaks = c(0,2,3,4,5,6),labels = c("Super Light","Light","Medium","Heavy", "Super Heavy")))

#Total_perf computing
data_combo <- data_combo %>% mutate(total_perf=as.numeric(Accel+Speed_mean)) 
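As a check of the formulas above, here is a small worked example using the stats of the Bowser + Biddybuggy + Roller + Cloud Glider combination as reported in the results table of this article. The lower-case variable names are mine, and the hand-written expression is only a sketch of a vectorised alternative to rowwise():

```r
# Stats of one combination, copied from the top-3 table in this article
speed_land <- 3.25; speed_antig <- 4.00; speed_water <- 4.50; speed_gliding <- 4.50
accel  <- 4.50
weight <- 3.25

# Weighted mean with Land and Anti-G counting twice: the weights sum to 6
speed_mean <- weighted.mean(c(speed_land, speed_antig, speed_water, speed_gliding),
                            w = c(2, 2, 1, 1))

# Same value written out by hand; this form also works on whole columns at once
speed_mean_manual <- (2 * speed_land + 2 * speed_antig + speed_water + speed_gliding) / 6

total_perf <- accel + speed_mean

# cut() uses right-closed intervals by default, so 3.25 falls in (3, 4]
w_cat <- cut(weight, breaks = c(0, 2, 3, 4, 5, 6),
             labels = c("Super Light", "Light", "Medium", "Heavy", "Super Heavy"))

round(c(speed_mean = speed_mean, total_perf = total_perf), 3)
as.character(w_cat)
```

The weighted mean is 23.5 / 6, about 3.917, which added to the acceleration of 4.5 gives a total of about 8.417 in the Medium category, in line with the table.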

Combinations summary

data_combo %>% ungroup() %>% 
  select(-ID) %>% 
  select_if(is.numeric) %>% 
  skim_to_wide() %>% 
  kable("html") %>% 
  kable_styling()
type variable missing complete n mean sd p0 p25 median p75 p100 hist
numeric Accel 0 8064 8064 3.4 0.83 1 2.75 3.5 4 5.75 ▁▂▇▇▇▇▂▁
numeric Handling_Anti-G 0 8064 8064 3.53 0.84 1 3 3.5 4.25 5.75 ▁▁▆▆▆▇▂▁
numeric Handling_Gliding 0 8064 8064 3.5 0.85 1 3 3.5 4 5.75 ▁▂▆▆▇▇▂▁
numeric Handling_Land 0 8064 8064 3.61 0.85 1 3 3.75 4.25 5.75 ▁▁▅▆▆▇▃▁
numeric Handling_mean 0 8064 8064 3.54 0.81 1.12 2.96 3.54 4.12 5.67 ▁▂▅▇▇▆▃▁
numeric Handling_Water 0 8064 8064 3.43 0.85 1 2.75 3.5 4 5.75 ▁▂▇▇▇▇▂▁
numeric M-turbo 0 8064 8064 3.35 0.8 1 2.75 3.25 4 5.75 ▁▂▇▇▇▇▂▁
numeric Speed_Anti-G 0 8064 8064 3.37 0.93 0.75 2.75 3.25 4 5.75 ▁▂▅▇▆▆▂▁
numeric Speed_Gliding 0 8064 8064 3.43 0.92 1 2.75 3.5 4.25 5.75 ▁▂▇▆▆▇▃▁
numeric Speed_Land 0 8064 8064 3.36 0.96 0.75 2.75 3.25 4 5.75 ▁▃▅▇▅▆▂▁
numeric Speed_mean 0 8064 8064 3.37 0.86 1.38 2.67 3.38 4.04 5.21 ▁▅▇▇▇▇▆▂
numeric Speed_Water 0 8064 8064 3.33 0.92 1 2.5 3.25 4 5.75 ▁▃▇▆▆▇▂▁
numeric total_perf 0 8064 8064 6.77 0.68 4.38 6.33 6.79 7.25 8.42 ▁▁▂▅▇▇▅▁
numeric Traction 0 8064 8064 3.31 0.84 0.75 2.75 3.25 4 5.75 ▁▂▃▇▆▆▁▁
numeric Weight 0 8064 8064 3.18 0.98 0.75 2.5 3.25 4 5.75 ▁▃▅▇▅▆▂▁

This table tells us that the maximum (p100) total performance we can get is around 8.42. Which combinations reach it?

data_combo %>% ungroup()%>% top_n(3,wt = total_perf) %>% arrange(desc(total_perf)) %>% kable("html") %>% kable_styling()
Char Kart Tire Glidder Speed_Land Speed_Anti-G Speed_Water Speed_Gliding Accel Weight Handling_Land Handling_Anti-G Handling_Water Handling_Gliding Traction M-turbo Speed_mean Handling_mean ID w_cat total_perf
Bowser Biddybuggy Roller Cloud Glider 3.25 4.00 4.50 4.50 4.50 3.25 3.25 3.25 2.75 3.25 3.00 4.50 3.916667 3.166667 1565 Medium 8.416667
Waluigi Biddybuggy Roller Cloud Glider 3.00 3.75 4.25 4.25 4.75 2.75 3.75 3.75 3.25 3.75 3.00 4.75 3.666667 3.666667 7109 Light 8.416667
Wario Biddybuggy Roller Cloud Glider 3.25 4.00 4.50 4.50 4.50 3.00 3.50 3.50 3.00 3.50 3.25 4.50 3.916667 3.416667 7613 Light 8.416667
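In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which also returns the rows already sorted. A minimal sketch on a toy table (the combination labels and scores are made up, they are not results from the analysis):

```r
library(dplyr)

# Toy combinations with made-up scores (NOT results from the analysis)
toy <- tibble::tibble(
  combo      = c("a", "b", "c", "d", "e"),
  total_perf = c(8.0, 8.4, 7.9, 8.4, 8.1)
)

# Three best rows, sorted in descending order; ties are kept by default
top3 <- toy %>% slice_max(total_perf, n = 3)
top3
```

With with_ties = TRUE (the default), more than n rows can come back when several combinations share the cut-off score, which matters here since the best combinations are tied.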

According to our definition of best combination,

Best overall combination?

ggplot(data_combo, aes(Speed_mean, Accel, label=paste(round(total_perf,2),round(Speed_mean,2), round(Accel,2),"\n",Char,"+",Kart,"+", Tire,"+", Glidder)))+
  geom_point(pch=21, alpha=0.95, aes(fill=`M-turbo`, size=Handling_mean))+
  viridis::scale_fill_viridis(direction=1)+
  geom_abline(intercept=0, slope=1,size=1.5)+
  geom_vline(xintercept=4)+
  geom_hline(yintercept=4)+
ggrepel::geom_label_repel(data=(data_combo %>% filter(Speed_mean >= 4 & Accel >= 4)%>% top_n(3,total_perf)), aes(x=Speed_mean, y=Accel,label=paste(round(total_perf,2),round(Speed_mean,2), round(Accel,2),"\n",Char,"+",Kart,"+", Tire,"+", Glidder)), size=2.6, segment.colour = "black",force=9,point.padding = unit(0.5, "lines"), box.padding = 0, color="black", fill="white", alpha=0.85, segment.size = 1, nudge_y = 0.5)+
  facet_wrap(~w_cat)

Comparative Chart of top_n

The problem here is that Speed_Land is low: Speed_mean is pulled up by the high Speed_Water.

Plot with transparent image (limited loop)

MarioK_img <- image_read("Mario Kart.png")
image_info(MarioK_img)
##   format width height colorspace matte filesize density
## 1    PNG  1678    957       sRGB  TRUE  1686733   37x37
noms <- c("Mario","Luigi","Peach","Daisy","Rosalina","Tanooki Mario","Cat Peach","Yoshi",paste("Yoshi",2:9, sep="_"),"Toad","Koopa Troopa","Shy Guy",paste("Shy Guy",2:9, sep="_"),"Lakitu","Toadette","King Boo","Baby Mario", "Baby Luigi", "Baby Peach","Baby Daisy", "Baby Rosalina","Metal Mario", "Gold Mario","Pink Gold Peach","Wario","Waluigi","Donkey Kong","Bowser","Dry Bones","Bowser Jr.","Dry Bowser","Lemmy","Larry","Wendy","Ludwig","Iggy","Roy","Morton","Inkling Girl",paste("Inkling Girl",2:3,sep="_"),"Inkling Boy",paste("Inkling Boy",1:2,sep="_"),"Link","Male Villager","Female Villager","Isabelle")

  
for (i in c(1:25)){
for (j in c(rep(752,1))) {
  name <- noms[i]
  m <- image_write(MarioK_img, format="png")
  m <- readPNG(m, F)
  m[,,4] <- m[,,4]*0.4 # adjust alpha
  m <- image_read(m)
  
  frame <- paste("60x60+",(i-1)*65+2,"+",j, sep="")
  
  assign(x = name ,value = image_crop(m, frame)) %>%
  image_write(.,path = paste("icons/",noms[i],".png", sep=""), format="png")}}

for (i in c(1:25)){
for (j in c(rep(820,1))) {
  name <- noms[i+25]
  m <- image_write(MarioK_img, format="png")
  m <- readPNG(m, F)
  m[,,4] <- m[,,4]*0.4 # adjust alpha
  m <- image_read(m)
  
  frame <- paste("60x60+",(i-1)*65+2,"+",j, sep="")
  
  assign(x = name ,value = image_crop(m, frame)) %>%
  image_write(.,path = paste("icons/",noms[i+25],".png", sep=""), format="png")}}

for (i in c(1:12)){
for (j in c(rep(882,1))) {
  name <- noms[i+50]
  m <- image_write(MarioK_img, format="png")
  m <- readPNG(m, F)
  m[,,4] <- m[,,4]*0.4 # adjust alpha
  m <- image_read(m)
  
  frame <- paste("60x60+",(i-1)*65+2,"+",j, sep="")
  
  assign(x = name ,value = image_crop(m, frame)) %>%
  image_write(.,path = paste("icons/",noms[i+50],".png", sep=""), format="png")}}
data_combo <- data_combo %>% ungroup() %>%mutate(img=sprintf("icons/%s.png",data_combo$Char))
ggplot(data_combo %>% group_by(Char) %>% sample_frac(1), aes(Speed_mean, Accel))+
  geom_image(aes(image=img), size=0.04) + coord_equal() + scale_x_continuous(limits = c(1,5.8))+
  scale_y_continuous(limits = c(1,5.8))+
  ggrepel::geom_label_repel(data=(data_combo %>% top_n(3,total_perf)), aes(x=Speed_mean, y=Accel,group=ID,label=paste("Total:",round(total_perf,2),"\n","Speed:",round(Speed_mean,2),"Accel:", round(Accel,2),"\n",Char,"+",Kart,"+", Tire,"+", Glidder)), size=1, segment.colour = "black",point.padding = unit(0.1, "lines"), color="black", fill="white", alpha=0.85, segment.size = 0.2, min.segment.length = 0, direction = "y", nudge_x = 2, force=4,label.padding = 0.1)+
  theme_classic()+
  facet_wrap(~w_cat,ncol=1)+
  geom_abline(alpha=0.2)
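The three cropping loops above differ only in the vertical offset of the row and in the number of icons. A possible way to factor that out is to generate the crop geometry strings with a small helper. crop_geometry() is a hypothetical name; the icon size, step and margin are the values used in the loops:

```r
# Build the geometry string understood by magick::image_crop():
# "<width>x<height>+<x offset>+<y offset>"
# Icon size (60), step (65) and margin (2) are the values used in the loops above
crop_geometry <- function(i, y, size = 60, step = 65, margin = 2) {
  sprintf("%dx%d+%d+%d", size, size, (i - 1) * step + margin, y)
}

# Vertical offsets and icon counts of the three rows of the sprite sheet
rows <- data.frame(y = c(752, 820, 882), n = c(25, 25, 12))

# One geometry string per icon, in the same order as `noms`
geometries <- unlist(Map(function(y, n) crop_geometry(seq_len(n), y), rows$y, rows$n))

length(geometries)  # 25 + 25 + 12 icons
geometries[1]       # first icon of the first row
```

Each string could then be passed to image_crop() in a single loop over seq_along(noms), and the alpha adjustment would only need to be done once, before the loop.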

Best combinations for specific characters

#Peach
ggplot(data_combo %>%filter(Char=="Peach"), aes(Speed_mean, Accel))+
  geom_point(pch=21, alpha=0.85, aes(fill=`M-turbo`, size=Handling_mean))+
  ggrepel::geom_label_repel(data=(data_combo %>%filter(Char=="Peach", Speed_mean >= 3 & Accel >= 3) %>% top_n(2, total_perf)), aes(x=Speed_mean, y=Accel,label=paste(total_perf,"\n",Kart,"+", Tire,"+", Glidder,"\n", w_cat)), size=2.6, segment.colour = "black",force=5,point.padding = unit(0.5, "lines"), box.padding = 1.5, color="black", fill="white", alpha=0.87, direction = "y")+
  scale_fill_distiller(direction=1, palette="Greens")

#Metal Mario
ggplot(data_combo %>%filter(Char=="Metal Mario"), aes(Speed_mean, Accel))+
  geom_point(pch=21, alpha=0.85, aes(fill=`M-turbo`, size=Handling_mean))+
  ggrepel::geom_label_repel(data=(data_combo %>%filter(Char=="Metal Mario", Speed_mean >= 3 & Accel >= 3) %>% top_n(2, total_perf)), aes(x=Speed_mean, y=Accel,label=paste(total_perf,"\n",Kart,"+", Tire,"+", Glidder,"\n", w_cat)), size=2.6, segment.colour = "black",force=5,point.padding = unit(0.5, "lines"), box.padding = 1.5, color="black", fill="white", alpha=0.87, direction = "y")+
  scale_fill_distiller(direction=1, palette="Greens")

#Tanooki Mario
ggplotly(ggplot(data_combo %>%filter(Char=="Tanooki Mario"), aes(Speed_mean, Accel))+
  geom_point(pch=21, alpha=0.85, aes(fill=`M-turbo`, size=Handling_mean,label=Kart, label2=Tire, Label3=Glidder), fill="grey80")+
  ggrepel::geom_label_repel(data=(data_combo %>%filter(Char=="Tanooki Mario", Speed_mean >= 3 & Accel >= 3) %>% top_n(2, total_perf)), aes(x=Speed_mean, y=Accel,label=paste(total_perf,"\n",Kart,"+", Tire,"+", Glidder,"\n", w_cat)), size=2.6, segment.colour = "black",force=5,point.padding = unit(0.5, "lines"), box.padding = 1.5, color="black", fill="white", alpha=0.87, direction = "y")+
    geom_abline()+
    geom_hline(yintercept = 4)+
    geom_vline(xintercept = 4))
#Koopa Troopa
ggplotly(ggplot(data_combo %>%filter(Char=="Koopa Troopa"), aes(Speed_mean, Accel))+
  geom_point(pch=21, alpha=0.85, aes(fill=`M-turbo`, size=Handling_mean,label=Kart, label2=Tire, Label3=Glidder), fill="grey80")+
    geom_abline()+
    geom_hline(yintercept = 4)+
    geom_vline(xintercept = 4))
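The per-character plots repeat the same filter with only the character name changing, which makes copy-paste slips easy. One way to avoid that could be a small helper that returns the labelled rows for any character. best_for() is a hypothetical name, and the toy table below uses made-up values rather than results from the analysis:

```r
library(dplyr)

# Hypothetical helper: best `n` combinations for one character,
# keeping only those above a minimum mean speed and acceleration
best_for <- function(combos, char, n = 2, min_speed = 3, min_accel = 3) {
  combos %>%
    filter(Char == char, Speed_mean >= min_speed, Accel >= min_accel) %>%
    slice_max(total_perf, n = n, with_ties = FALSE)
}

# Toy data with made-up values (NOT results from the analysis)
toy <- tibble::tibble(
  Char       = c("Peach", "Peach", "Peach", "Metal Mario"),
  Kart       = c("k1", "k2", "k3", "k1"),
  Speed_mean = c(3.5, 3.2, 2.8, 4.2),
  Accel      = c(3.5, 4.0, 5.0, 3.0)
) %>%
  mutate(total_perf = Speed_mean + Accel)

peach_best <- best_for(toy, "Peach")
peach_best
```

The result can be passed as the data argument of geom_label_repel(), so the character name only has to be written once per plot.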